Sequential intrahost evolution and onward transmission of SARS-CoV-2 variants

Persistent severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections have been reported in immune-compromised individuals and people undergoing immune-modulatory treatments. Although intrahost evolution has been documented, direct evidence of subsequent transmission and continued stepwise adaptation is lacking. Here we describe sequential persistent SARS-CoV-2 infections in three individuals that led to the emergence, forward transmission, and continued evolution of a new Omicron sublineage, BA.1.23, over an eight-month period. The initially transmitted BA.1.23 variant encoded seven additional amino acid substitutions within the spike protein (E96D, R346T, L455W, K458M, A484V, H681R, A688V), and displayed substantial resistance to neutralization by sera from boosted and/or Omicron BA.1-infected study participants. Subsequent continued BA.1.23 replication resulted in additional substitutions in the spike protein (S254F, N448S, F456L, M458K, F981L, S982L) as well as in five other virus proteins. Our findings demonstrate not only that the Omicron BA.1 lineage can diverge further from its already exceptionally mutated genome but also that patients with persistent infections can transmit these viral variants. Thus, there is an urgent need to implement strategies to prevent prolonged SARS-CoV-2 replication and to limit the spread of newly emerging, neutralization-resistant variants in vulnerable patients.

and then look at changes in haplotype frequencies over time.
One critical comment is about the availability of the sequence data. According to the Data Availability Statement, only nasal swab consensus samples have been made available (on GISAID). I assume these include NP and AN samples? For reproducibility, the deep sequencing data also need to be made available, and the accession numbers of the sequences and the SRA project numbers need to also be provided.
Minor comments: Lines 93-106: it's unclear to me which of these identified mutations were at the consensus level versus fixed vs present at low frequencies. I think, given the reference to Figure 1, that this analysis is all consensus level. Line 103: The conclusion that 'two different viral populations emerged' is confusing to me if all the analyses are at the consensus level; having those three different substitutions at the consensus level doesn't necessarily indicate to me that two different viral populations emerged. To conclude this, it seems that below-the-consensus analyses would have to be done (with haplotype reconstruction). Lines 104-106 indicate that there were nonsynonymous (+1 synonymous) mutations outside of spike, but I don't see those results in Figure 1. I am trying to wrap my head around Figure 1… These are SNVs relative to which reference? If a position goes from blue line to red line (e.g. E96D), what does that mean? Are all unlabeled SNVs synonymous and labeled ones nonsynonymous? (I don't think this is the case, but if that's not the case, is there information in this figure that denotes which type the SNV is?) What does a bold line signify vs an unbolded line? Day 48, FCS region: is there an error in plotting, i.e. should the red line be under the red line for day 40? Based on Figure 2A, it is an error in plotting. Lines 88-91: 'Some of these escape mutations in BA.1.23 are now signature substitutions in emerging Omicron lineages such as BA.2.75.2, indicating that persistent viral replication in the context of suboptimal immune responses is an important driver of SARS-CoV-2 diversification'. This statement is, I think, an overreach. How do the observed convergent mutations between BA.1.23 and BA.2.75.2 indicate that 'persistent viral replication in the context… is an important driver of SARS-CoV-2 diversification'? Line 112: E484V (rather than A484V?), at least according to Figure 1.
Line 114: 'Although viral isolation failed for specimens available from the index case...' I don't understand. Wasn't the virus isolated from P1? Lines 116-117: Here, it seems that the forward transmissions were inferred based only on the common nonsynonymous substitutions found in spike. Why present this work this way, rather than by presenting Figure 2A first (whole-genome analysis, using both nonsynonymous and synonymous variation)? Please see my main comment about reorganization. Line 120: could you clarify how differing in age and gender from the 4 cases (P1-P4) indicates that there was independent but limited community spread of this subvariant? Mapping both synonymous and nonsynonymous substitutions onto the phylogeny (current Figure 2A, or a time-aligned version of this phylogeny) would be helpful. Figure 2B: Are mSNV frequencies also potentially available for the GISAID sequences? (I.e., are there short-read data available in the SRA that correspond to these GISAID consensus sequences?) Figure 2B: Rather than red and blue for in spike vs outside of spike, it would be more helpful to color by nonsynonymous vs synonymous (vertical lines could denote the spike region). Instead of introducing the term 'mSNV', why not call it an 'iSNV' (e.g., McCrone et al. eLife)? Line 185: how do we see this in Figure 2A?
In sum, this is an important case study that documents limited forward transmission of a highly divergent SARS-CoV-2 lineage that evolved in a chronically infected individual. The impact and interpretability of this work could be considerably improved from a restructuring of the manuscript. Beyond this, there are several other major comments (above) that, if addressed, would strengthen the manuscript.
Reviewer #2: Remarks to the Author: Severely immunocompromised patients are at risk for severe and prolonged SARS-CoV-2 infection and, as a result, are an important potential source of viral mutation and development of variants. The authors characterize genetic mutations that occur over time, and attempt to demonstrate forward spread, which would be an important contribution to the field. However, data that clearly support transmission of novel strains are lacking. Additionally, the clinical details provided could be further clarified, and the link between antivirals and the emergence of mutations should not be overstated.
Results, page 5: The authors characterize amino acid substitutions that developed in the SARS-CoV-2 strain of patient 1. What is the baseline rate of amino acid substitutions of SARS-CoV-2, to help determine whether persistent infection in an immunocompromised host is a driver of evolution?
Results, page 5: The authors reference isolating variants with shared amino acid substitutions, all of whom were patients with hematologic malignancies. Was an actual cluster analysis (time and location) performed to help confirm hospital transmission? If not, there are not enough data to state that this is an outbreak.
Results, pages 7-8: The patient received non-EUA-approved courses of therapies, e.g. 3-4 weeks of Paxlovid. What was the route for obtaining these therapies?
Results, page 8: The authors make multiple statements about therapies received and the subsequent detection of mutations, or lack of persistent infection, that may not be causative.
Discussion, page 11: Provide a reference for low antibody levels being a risk factor for immune escape mutations. Antivirals were not studied for the outcome of "eliminating persistent infection" but rather for reducing the severity of infection. In Figure 3, it appears that the cycle threshold of patients did increase (lower viral load) after receipt of antivirals?
Authors use the term "fully vaccinated" but do not provide details on which vaccines (we know that mRNA vaccines are more immunogenic than adenovirus vector, for example) and how many doses.
Reviewer #3: Remarks to the Author: Gonzalez-Reiche et al. analysed SARS-CoV-2 persistent infections in immunocompromised patients. They described the emergence, transmission and subsequent evolution of the new Omicron sublineage BA.1.23 in patients with persistent SARS-CoV-2 infection and replication. They observed that the initial substitutions were within the spike but continued replication led to substitutions in other viral proteins. The authors also showed that the BA.1.23 variant was more resistant to neutralising antibodies induced post-booster vaccination or after BA.1 breakthrough infection compared to BA.1 and the ancestral Wuhan strain.
Understanding how SARS-CoV-2 variants emerge and further evolve in immunosuppressed hosts is crucial to develop strategies to treat these individuals in order to prevent the emergence of variants. This paper is highly relevant. The methodology looks appropriate to me. The paper is well written and the figures are very clear. I only have a few comments. A few of the points mentioned below could perhaps be added to the discussion or clarified.
Major Comments: 1) Line 113: when the authors speak about transmission, do they mean that these patients had some contact(s) at the hospital? Is there any evidence they were in the same unit at the same time?
2) Line 117: do the authors mean B-cell and T-cell deficiencies in all patients? Or only B-cell deficiencies?
3) Lines 181-182: did the authors try to discriminate vaccine-induced and infection-induced antibodies by measuring anti-N, even though I presume IVIg may also bind to N? More globally, I was wondering whether the authors compared anti-S and anti-N antibodies to see if residual antibodies of different specificities were measured. In addition, I don't see Figure 2C mentioned in the text. It is probably 3B.
4) Line 198 and line 205: the data suggest that a suboptimal level of antibodies may lead to a selective pressure. Do the authors mean these antibodies were not functional (neutralising, Fc receptor function)? I was wondering whether it was only related to the magnitude or also to a lack of functionality. In addition, related to the use of MAb treatment, is there a way to adjust the dose and/or length of treatment to make sure the benefit outweighs the risk of selective pressure?
5) Did the authors analyse whether currently available MAb-based therapies could neutralise BA.1.23?
6) Line 262: Is there any role of T cells in the intrahost evolution? Can selection also be driven by T cell escape, or is it only related to the antibody response? Do the authors have any T cell data in these patients?
7) Are there any forms of immunosuppression that are more likely to lead to the emergence of variants and mutations leading to escape patterns?
Minor comments: Figure 3: legends B and C were inverted. Line 383: typo "different time points". Extended data Figure
The epidemiological contexts linking the index case to the other individuals should be stated. In the relevant Results sub-section (lines 108-128), the authors describe forward transmission from the index case (P1) to five other individuals, comprising three immunocompromised patients (P2, P3, P4) and two community members (GISAID S1 and S2), based on the shared presence of a unique combination of 7 spike protein mutations. Is there available information on how the other individuals came into contact with the index case, such as whether the three patients shared the same ward? Did the two additional individuals from the NYC area come into contact with the index case, or is there a possibility they were infected elsewhere? This would help the reader understand the transmissibility of the BA.1.23 variant.
Lines 214-215: Why were these 7 mutations selected as the defining mutations of BA.1.23 out of all detected mutations, not including minority variants (Figure 1)?
Lines 203-204: P4 had low antibody titres at the time of the first positive PCR although they were unvaccinated. Could this be an indication of a previous COVID-19 infection? Are past infection histories of the other cases known?

Minor comments
Lines 45-46: I find this sentence misleading, as it sets up the reader to imagine forward transmissions from three index cases. While it is true that the study includes three individuals with persistent infections, the fact that there was only one source of forward transmission affects how the transmissibility of the emerging variant BA.1.23 is perceived.

Point by point responses to each of the reviewers
We thank the reviewers for their thoughtful and constructive evaluations, and we have revised the manuscript to address their comments.
Of note, we added data from three additional nasopharyngeal swab specimens from patient P3 that were collected after the submission of the original manuscript. No additional BA.1.23 cases were detected in our health system or globally after the latest sequenced specimen was captured from P3, who passed away 271 days after the initial identification of BA.1.23. We believe that our precision surveillance approach has helped to limit the spread of the BA.1.23 lineage.

Reviewer #1:
In this manuscript, Gonzalez-Reiche et al. document the case of an immunosuppressed, chronically infected SARS-CoV-2 patient (P1) over the course of 81 days, along with indirect evidence of forward transmission from this individual to 3 other patients (P2, P3, and P4) and into the broader community (2 GISAID sequences from NYC). In terms of detailing viral evolutionary dynamics in chronically infected individuals, this work adds to many existing SARS-CoV-2 studies without profoundly new insights. The real novelty of this work is that it provides sequence-derived evidence of forward transmission of a highly evolutionarily divergent viral lineage from this chronically infected individual (P1) to other individuals. This is important to demonstrate because VOCs such as Omicron and Alpha have been hypothesized to have evolved in chronically infected individuals, but there has been a lack of case studies demonstrating that viral genotypes that have evolved in these individuals can lead to successful forward transmission.

R1.1.
While the work presented in this manuscript is important and the analyses presented appear largely thorough, I think the work would benefit from a major reorganization. Currently, the results section begins with a detailed description of the consensus-level spike gene viral substitutions that occurred in patient P1. The next section then uses those substitutions as evidence for forward transmission to patients P2-P4 and the broader community. Only then do the authors briefly point to a phylogeny (Fig 2A), prior to detailing the evolution of the viral populations in patients P1-P4 below the consensus level.
It seems to me that the results should start with figures that conclusively demonstrate forward transmission based on whole-genome consensus sequences. This would include the current Figure 2A (which would become Figure 1A). A second, time-aligned phylogeny here would also be very informative as Figure 1B, as it would help reconstruct transmission times (based on tMRCAs/internal node times). Separate tables for P1-P4 and the 2 GISAID sequences could then list consensus-level nucleotide substitutions (ideally relative to P1 (d23), for clarity, since we don't care about the substitutions that have occurred since the Wuhan reference strain). These tables could indicate which of these substitutions were synonymous and which nonsynonymous (along with amino acid changes). Then present the below-the-consensus findings and consider whether they tell you anything more about the timing of transmission, or potentially about bottlenecks between individuals. And then analyze the evolutionary dynamics in terms of selection and phenotypic changes that occurred (this would include both substitutions and 'mSNVs').
I'm suggesting this reorganization so that the questions of interest are sequentially addressed:
• Did onward transmission occur from chronically infected patient P1?
• If yes, when did onward transmissions occur?
• Which phenotypes did the transmitted virus carry?
I think most of the analyses are there in the manuscript already; there is just considerable effort needed on the part of the reviewer to piece the parts together.
Answer: We appreciate the suggestion by this reviewer, but we feel that starting the results with a presentation of all specimens and cases would not reflect the chronology of events, which included an initial stage of emergence of the BA.1.23 lineage, followed by forward transmission and further evolution of the lineage in recipients that developed additional persistent infections. We also note that none of the other reviewers have commented on the order in which the results are presented.
To address questions regarding the timing of onward transmissions, we have constructed a time-aligned phylogeny to help reconstruct transmission times, now shown as an inset in Figure 2A and displaying the estimated TMRCAs with 95% confidence intervals. We also now discuss the potential for direct or indirect contact between patients P1, P2, P3 and P4 while receiving care in our health system, based on admission/discharge/transfer records.
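As a rough, illustrative complement to a time-aligned phylogeny, a root-to-tip regression relates each genome's divergence from the root to its sampling date: the slope approximates the substitution rate and the x-intercept gives a crude root date. This is a simpler technique swapped in for illustration, not the dating method used for Figure 2A, and the function name and toy inputs below are hypothetical.

```python
def root_to_tip_fit(dates, distances):
    """Least-squares fit of root-to-tip divergence against sampling date.

    Returns (rate, root_date): the slope (divergence per unit time) and the
    x-intercept, i.e. the date at which the fitted line reaches zero divergence.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need matching dates and distances for at least two tips")
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    if rate == 0:
        raise ValueError("no temporal signal: fitted rate is zero")
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate

# Toy tips: sampling day (days since the first sample) and divergence (arbitrary units).
rate, root_date = root_to_tip_fit([0, 10, 20], [1.0, 2.0, 3.0])
print(rate, root_date)
```

With these toy values the fitted rate is 0.1 units per day and the root falls at about day -10. Dedicated dating tools also report uncertainty on node times, which this sketch does not attempt.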

Other major comments:
R1.2: Some of the samples were NP samples and some were AN samples, but, with the exception of Figure 3, none of the other analyses indicate this. Could compartmentalization be occurring between these sampling sites, such that analyzing the sequence data as though they came from the same site may not be appropriate?

Answer:
We have added sample type annotations to Figure 2 to ensure that the same information is consistently represented throughout the manuscript. We found no strong evidence for compartmentalization, as there is no grouping of genotypes by sample origin. However, after the addition of three new specimens for patient P3, we did note a marked decrease in minor intrahost SNVs (miSNVs) and an increase in consensus mutations in the AN specimen collected from P3 on day 131, compared to the earlier and later NP specimens. We have now included this information on lines 173-175.
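A minimal way to screen for the sample-type pattern described above is to average per-specimen miSNV counts within each patient and sample type. The record layout, the patient label "PX", and the counts below are hypothetical placeholders, not study data, and a difference in averages alone can neither establish nor exclude compartmentalization.

```python
from collections import defaultdict
from statistics import mean

def mean_minor_by_sample_type(specimens):
    """Average minor-variant counts per (patient, sample type).

    `specimens` is an iterable of hypothetical records:
    (patient_id, day, sample_type, minor_variant_count).
    """
    groups = defaultdict(list)
    for patient, _day, sample_type, n_minor in specimens:
        groups[(patient, sample_type)].append(n_minor)
    return {key: mean(counts) for key, counts in groups.items()}

# Toy records for a single hypothetical patient.
toy = [("PX", 10, "NP", 10), ("PX", 20, "NP", 12), ("PX", 30, "AN", 2)]
print(mean_minor_by_sample_type(toy))
```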

R1.3:
Line 116-117: more metadata here, if possible, would be really helpful and strengthen the results considerably. Is there documentation that these three secondary infections (patients) had direct contact with P1? Or potentially indirect contact through use of the same room?
Answer: Based on the pattern of mutations in the viral genome we suspect the initial transmission(s) from P1 occurred in an 18-day period between day 64 and day 82. During this time, P2 (10 days overlap) and P4 (18 days overlap) were admitted to same unit (unit A) as P1. P2 and P4 stayed in neighboring rooms, which were separated by 6 rooms from P1. On day 80, P4 was transferred to another room in a different unit (unit B), which was separated by 1 room from P3. Patients P3 and P4 overlapped in unit B for two days until P4 was discharged on day 82. As such, there was a potential for direct or indirect contact between all four patients.
A potential sequence of events based on the available data is that BA.1.23 initially transmitted from P1 to P2 and/or P4 in unit A, with possible secondary transmissions between P2 and P4. Onward transmission from P4 to P3 then took place in unit B. In this scenario, both P3 and P4 would have been infected with BA.1.23 for three weeks before their initial positive tests. While this is well beyond documented incubation times of SARS-CoV-2, we note that both patients developed persistent infections lasting 1-4 months. As such, it is possible that the onset of persistent infection in these patients occurred earlier, during the window of potential contact. An alternative scenario is that P1 transmitted to P2 on unit A between day 64 and day 72, before P2 was discharged on day 72. P3 and P4 then had later exposures before both patients were re-admitted to the hospital.
As P3 and P4 were not tested for SARS-CoV-2 during the three-week window between potential contact and their initial positive tests, we cannot determine which scenario is the most likely. We therefore chose to provide a more limited assessment of potential patient contacts based on the unit admission history in the revised manuscript (Lines 126-138).

R1.4: According to Figure 2A, P3's day 131 consensus sequence appears to be highly divergent, even relative to that same patient's day 117 and day 112 sequences. This doesn't seem right. I know the authors argue that this is not recombination, but further analyses to make sure that the results here are accurate and, if they are, to interpret these dramatic changes would be good. I think to do this properly, one would need to infer haplotypes in the viral population based on the read data, and then look at changes in haplotype frequencies over time.
Answer: We were also intrigued by the observation of the highly divergent sequence on day 131. To confirm this was not a technical artifact, the specimen was sequenced twice. In addition, we successfully isolated the virus in tissue culture and sequenced the viral isolate. The replicate sequences from the original isolate and virus isolates (both passage 0 and passage 1) contained the same mutations at the consensus level. The viral isolates also contained a mutation observed at low frequency in the original specimen at nucleotide position 4391 (C-A). These data indicate strongly that the constellation of mutations observed in the original specimen are real and not artefacts introduced during PCR amplification or the sequencing process. (See Nextclade results below).
The lack of spanning read information to link distal mutations complicates reconstruction of haplotypes from short-read sequencing data. However, we performed additional analyses to rule out recombination. First, we performed recombination analysis at the consensus level with RIPPLES [PMID: 35952714], using default parameters except that we reduced the minimum number of descendants from 10 to 3 to account for our sample size (n=24 with a maximum of 13 genomes per patient). This analysis was done using a global-scale phylogeny generated with UShER from a comprehensive collection of unique public sequences from GenBank, COG-UK and the China National Center for Bioinformation (http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/). The RIPPLES analysis did not identify recombination events for P3 genomes with parsimony score improvements above 7, which are more likely to reflect true recombination events.
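The spanning-read limitation can be made concrete: two variants can only be phased directly from reads that cover both positions. The function below is a hypothetical sketch that tallies allele pairs among such reads; it assumes gapless reads given as (1-based start, sequence) and ignores indels, soft clipping and base qualities, all of which a real implementation would need to handle.

```python
from collections import Counter

def pair_haplotypes(reads, pos_a, pos_b):
    """Count allele pairs at pos_a and pos_b among reads covering both.

    Each read is (start, sequence) with a 1-based start on the reference and is
    assumed to align without gaps.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq) - 1
        if start <= pos_a <= end and start <= pos_b <= end:
            counts[(seq[pos_a - start], seq[pos_b - start])] += 1
    return counts

toy_reads = [(1, "ACGTA"), (1, "ACTTA"), (3, "GTAC")]
print(pair_haplotypes(toy_reads, 3, 4))
```

Positions farther apart than a read length yield an empty tally, which is why distal mutations cannot be linked this way.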
We also queried the combination of nucleotide substitutions observed in the sequences for P3 containing minority variants from day 46 to 131. For this, we used a sliding window of 3 consecutive mutations to identify lineages carrying similar combinations in covSPECTRUM (https://cov-spectrum.org/). Search criteria included the global diversity between November 2021 (emergence of Omicron) and September 2023 (last known detections of BA.1.23). From this analysis, only a handful of queries returned matches, with fewer than 10 sequences identified worldwide. We do not believe these consecutive genotype matches to be valid because they 1) were very rare; 2) were inconsistent with recombination breakpoints; 3) were found only outside the US; and/or 4) pre-dated the appearance of the minority variants in P3 by >3 months. In summary, we did not identify contemporary lineages with mutations seen in the P3 d131 consensus sequence that could be clearly identified as a source of recombination with the BA.1.23 lineage.
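The sliding-window query construction described above can be sketched as follows. Mutations are assumed to be written as reference base, position, alternate base (e.g. C4391A); the function names and toy mutations are hypothetical, and formatting each window into an actual covSPECTRUM query is not shown.

```python
import re

_MUTATION = re.compile(r"([ACGT])(\d+)([ACGT])")

def position(mutation):
    """Genome position of a nucleotide substitution such as 'C4391A'."""
    match = _MUTATION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    return int(match.group(2))

def consecutive_windows(mutations, size=3):
    """Every run of `size` consecutive mutations, ordered by genome position."""
    ordered = sorted(mutations, key=position)
    return [ordered[i:i + size] for i in range(len(ordered) - size + 1)]

for window in consecutive_windows(["A300G", "C100T", "G200A", "T400C"]):
    print(",".join(window))
```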

[Figure: Number of Sequences]
Considering the progressive accumulation of intrahost minority variants over time, we consider recombination an unlikely scenario. We have added the details of this analysis to the revised manuscript in the results (lines 181 to 188).

R1.5.
One critical comment is about the availability of the sequence data. According to the Data Availability Statement, only nasal swab consensus samples have been made available (on GISAID). I assume these include NP and AN samples? For reproducibility, the deep sequencing data also need to be made available, and the accession numbers of the sequences and the SRA project numbers need to also be provided.
Answer: All RNA-seq data and relevant metadata have now been deposited to the NIH Sequence Read Archive (SRA) submission SUB12865927, under accession numbers SAMN33273632-SAMN33273655. The "Data Availability" section has been updated accordingly.
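Assuming the accession range above is contiguous, it expands to 24 BioSample accessions, the same number as the n=24 genomes cited in the recombination analysis. The helper below is an illustrative sketch for listing them, not an NCBI tool, and its name is hypothetical.

```python
import re

_ACCESSION = re.compile(r"([A-Z]+)(\d+)")

def expand_accession_range(first, last):
    """List every accession from `first` to `last`, assuming a contiguous range."""
    m_first = _ACCESSION.fullmatch(first)
    m_last = _ACCESSION.fullmatch(last)
    if m_first is None or m_last is None:
        raise ValueError("accessions must be a letter prefix followed by digits")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same prefix")
    prefix, digits = m_first.group(1), m_first.group(2)
    width = len(digits)
    low, high = int(digits), int(m_last.group(2))
    if low > high:
        raise ValueError("first accession must not exceed the last")
    return [f"{prefix}{n:0{width}d}" for n in range(low, high + 1)]

accessions = expand_accession_range("SAMN33273632", "SAMN33273655")
print(len(accessions), accessions[0], accessions[-1])
```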
Minor comments: R1.6: Lines 93-106: it's unclear to me which of these identified mutations were at the consensus level versus fixed vs present at low frequencies. I think, given the reference to Figure 1, that this analysis is all consensus level.
Answer: This is correct. All mutations discussed in the above-mentioned paragraph were observed at the consensus level. We have clarified this in the text: "Over a 12-week period, we documented the accumulation of nine amino acid substitutions in the spike protein ... (Fig. 1) within the same patient at the consensus level." (lines 96-99). We also stated the number of changes outside spike more clearly.
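To make the consensus versus low-frequency distinction explicit, each variant can be binned by its within-sample allele frequency. The cutoffs below (0.95 for fixed, above 0.5 for consensus, 0.03 for minor) are hypothetical illustrations, not the thresholds used in the manuscript, and the function name is likewise hypothetical.

```python
def classify_variant(freq, fixed=0.95, consensus=0.5, minor=0.03):
    """Bin an allele frequency (0 to 1); all cutoffs are hypothetical."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if freq >= fixed:
        return "fixed"
    if freq > consensus:
        return "consensus"
    if freq >= minor:
        return "minor"
    return "below threshold"

for f in (0.99, 0.70, 0.10, 0.01):
    print(f, classify_variant(f))
```

Note that under this scheme a "fixed" variant is also present at the consensus level; the separate label only records that it has essentially displaced the ancestral allele.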

R1.7:
Line 103: The conclusion that 'two different viral populations emerged' is confusing to me if all the analyses are at the consensus level; having those three different substitutions at the consensus level doesn't necessarily indicate to me that two different viral populations emerged. To conclude this, it seems that below-the-consensus analyses would have to be done (with haplotype reconstruction).

Answer:
The reviewer is correct. From Figure 1 alone it cannot be concluded that two different viral populations emerged. This remark was meant to emphasize that additional mutations were observed throughout the remainder of the infection. We have reworded this sentence as "During the following weeks additional mutations emerged; samples from this period contained shared (L455W) as well as distinct signature mutations (E96D on day 72, S477D on day 81)." Lines 102-104.

R1.8: Lines 104-106 indicate that there were nonsynonymous (+1 synonymous) mutations outside of spike, but I don't see those results in Figure 1.

Answer:
We have now added a reference to Extended Data Figure 1 that contains the whole genome alignment where the synonymous mutations are included.
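As a sketch of how each SNV could be labeled synonymous or nonsynonymous (as Reviewer #1 suggested for coloring Figure 2B), the function below translates the reference and mutant codons with the standard genetic code. It assumes the affected codon and the SNV's offset within it are already known from the genome annotation, and it handles one SNV per codon; the function name and toy codons are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_snv(ref_codon, offset, alt_base):
    """Classify one SNV at `offset` (0, 1 or 2) within a reference codon."""
    ref_codon, alt_base = ref_codon.upper(), alt_base.upper()
    if ref_codon not in CODON_TABLE or offset not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected an unambiguous codon, offset 0-2 and base A/C/G/T")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, kind

print(classify_snv("GAA", 2, "T"))  # glutamate codon changed to an aspartate codon
print(classify_snv("CTT", 2, "C"))  # leucine codon changed to another leucine codon
```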

R1.9: I am trying to wrap my head around Figure 1… These are SNVs relative to which reference? If a position goes from blue line to red line (e.g. E96D), what does that mean? Are all unlabeled SNVs synonymous and labeled nonsynonymous (I don't think this is the case, but if that's not the case, is there information in this figure that denotes which type the SNV is?) What does a bold line signify vs an unbolded line? Day 48, FCS region: is there an error in plotting, i.e. should the red line be under the red line for day 40? Based on Figure 2A, it is an error in plotting.
Answer: All comparisons are relative to the ancestral Wu-1 reference strain, with BA.1 signature mutations shown in blue and the novel mutations that emerged in patient P1 shown in red. The colorings of the alignment are indicated in the legend and in the figure itself. In the original figure, the red lines representing novel mutations were thicker than the blue lines representing the BA.1 SNVs. To avoid confusion, we have now made the thickness of the line consistent. Please note, however, that the close separation between some consecutive mutations may give the appearance of a single thicker line. In these instances, we further separated the amino acid changes at the bottom of the alignment.
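The dependence on the chosen reference can be illustrated with a minimal sketch that lists SNVs between two pre-aligned, equal-length sequences, skipping ambiguous bases and gaps. The function name and toy sequences are hypothetical: switching the reference from a Wu-1-like sequence to an earlier within-patient consensus removes the shared lineage mutation and leaves only the newly acquired one, which is the rebasing Reviewer #1 proposed (relative to P1 day 23).

```python
def call_snvs(reference, query, ignore="N-"):
    """List SNVs of `query` relative to `reference` as e.g. 'G7C' (1-based).

    Both sequences must already be aligned to the same length; positions where
    either sequence has an ambiguous base or gap (characters in `ignore`) are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference.upper(), query.upper()), start=1)
        if ref != alt and ref not in ignore and alt not in ignore
    ]

wu1_like = "ACGTACGT"  # toy stand-in for the ancestral reference
early = "ACTTACGT"     # toy early consensus carrying a lineage mutation (G3T)
late = "ACTTACCT"      # toy later consensus with one additional mutation (G7C)
print(call_snvs(wu1_like, late))  # lineage mutation plus the new one
print(call_snvs(early, late))     # only the newly acquired mutation
```

The same logic applies at the amino-acid level: BA.1 already carries E484A, so a change to valine at that site reads as E484V relative to Wu-1 but as A484V relative to BA.1.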